Extension Architecture
This document explains the architecture of the browser extension, which is built with the WXT framework. It focuses on the three main entry points:
Background script for extension-wide operations and cross-tab coordination
Content script for page-level automation and DOM interaction
Side panel UI for user interaction and agent orchestration
It documents extension configuration, manifest setup, messaging architecture, component relationships, lifecycle management, and integration patterns with browser APIs. Security, permissions, and performance optimization strategies are also covered.
The extension is organized under the extension directory with WXT entrypoints and React-based UI components. Key areas:
Configuration: wxt.config.ts defines module usage, permissions, and host permissions
Background: background.ts handles messaging, tab management, and agent tool execution
Content: content.ts manages page-level automation and DOM interactions
Side Panel: React app mounted via shadow DOM with hooks for auth, tabs, and WebSocket
Utilities: websocket-client.ts, executeActions.ts, and shared parsing utilities
Background Script: Central orchestrator for messaging, tab state, and agent tool execution. Handles message routing for agent actions, tab operations, and Gemini requests.
Content Script: Page-level automation that injects or removes visual overlays and performs DOM actions (click, type, scroll) via injected scripts.
Side Panel UI: React application mounted in a shadow DOM, providing user controls, authentication, tab management, and agent execution with WebSocket integration.
Key responsibilities:
Messaging: bidirectional communication between UI, background, and content scripts
Permissions: activeTab, tabs, storage, scripting, identity, sidePanel, webNavigation, webRequest, cookies, bookmarks, history, clipboard, notifications, contextMenus, downloads
Cross-origin: host_permissions for <all_urls>
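As a rough illustration of how the permissions above might be declared and checked, the sketch below builds a manifest fragment as a plain object. In a WXT project such an object would typically be passed as the `manifest` option of `defineConfig` in wxt.config.ts. The `missingPermissions` helper and the subset of permissions shown are assumptions for illustration, not the project's actual configuration.

```typescript
// Hypothetical manifest fragment; in wxt.config.ts it would be passed as
// `defineConfig({ manifest })`. Only a subset of the listed permissions is shown.
interface ManifestFragment {
  permissions: string[];
  host_permissions: string[];
}

const manifest: ManifestFragment = {
  permissions: ["activeTab", "tabs", "storage", "scripting", "identity", "sidePanel"],
  host_permissions: ["<all_urls>"],
};

// Returns the permissions a feature requires that the manifest does not declare.
function missingPermissions(m: ManifestFragment, required: string[]): string[] {
  return required.filter((p) => !m.permissions.includes(p));
}
```

A check like this could run in a build step or unit test so that a feature relying on an undeclared permission fails early rather than at runtime.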
The extension follows a layered architecture:
UI Layer: Side panel React app with hooks for auth and tab management
Control Layer: Background script managing messaging and cross-tab operations
Automation Layer: Content script performing DOM-level actions
Utility Layer: WebSocket client and action executor utilities
Component relationships shown in the diagram:
UI (React App) -> Background Script (Messaging Hub)
Background Script -> Content Script (DOM Automation)
UI -> WebSocket Client, Auth Hook, Tab Management Hook
Background Script -> Action Executor
Background Script -> Browser APIs: runtime, tabs, storage, identity, scripting, webNavigation, webRequest
Background Script
Responsibilities:
Message routing for agent tool execution, tab activation/deactivation, tab queries, action execution, Gemini requests, and generated agent runs
Tab tracking via browser.tabs listeners and storage updates
Dynamic imports for external libraries (e.g., Gemini SDK)
Injection of content scripts and inter-tab messaging
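The tab-tracking responsibility can be sketched as a pure reducer that turns tab events into the list the UI reads from storage. This is a minimal sketch under assumptions: the `TrackedTab` fields and the `TabEvent` names are hypothetical, and the actual background script may track different data. In practice the reducer would be called from `browser.tabs` listeners and its result written to storage.

```typescript
// Minimal tab record kept for the UI; field names are assumptions.
interface TrackedTab {
  id: number;
  url: string;
  title: string;
  active: boolean;
}

// Hypothetical event shapes derived from tab lifecycle listeners.
type TabEvent =
  | { kind: "created" | "updated"; tab: TrackedTab }
  | { kind: "removed"; tabId: number }
  | { kind: "activated"; tabId: number };

// Pure reducer: returns the next tab list without mutating the input.
function applyTabEvent(tabs: TrackedTab[], event: TabEvent): TrackedTab[] {
  switch (event.kind) {
    case "created":
    case "updated": {
      const tab = event.tab;
      const exists = tabs.some((t) => t.id === tab.id);
      // Replace in place when the tab is known, otherwise append it.
      return exists ? tabs.map((t) => (t.id === tab.id ? tab : t)) : [...tabs, tab];
    }
    case "removed": {
      const removedId = event.tabId;
      return tabs.filter((t) => t.id !== removedId);
    }
    case "activated": {
      const activeId = event.tabId;
      return tabs.map((t) => ({ ...t, active: t.id === activeId }));
    }
  }
}
```

Keeping the state transition separate from the browser listeners makes it testable without a browser and keeps the storage write to a single place.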
Key flows:
Message listener routes incoming runtime messages to handlers
Tab management updates local storage for UI consumption
Action execution injects content scripts and forwards actions to content script
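The message-routing flow described above might look roughly like the following. The message shape, the type names (`GET_TABS`, `EXECUTE_ACTION`), and the `createRouter` helper are all hypothetical; the real handlers in background.ts are not shown in this document.

```typescript
// Message shape and type names are hypothetical.
interface RuntimeMessage {
  type: string;
  payload?: unknown;
}

// A handler may return a plain value or a Promise.
type Handler = (payload: unknown) => unknown;

function createRouter(handlers: Record<string, Handler>) {
  return (message: RuntimeMessage): unknown => {
    // Own-property check so names like "toString" never resolve to
    // inherited Object.prototype members.
    if (!Object.prototype.hasOwnProperty.call(handlers, message.type)) {
      return { ok: false, error: `Unknown message type: ${message.type}` };
    }
    return handlers[message.type](message.payload);
  };
}

const route = createRouter({
  GET_TABS: () => ({ ok: true, tabs: [] }),
  EXECUTE_ACTION: (payload) => ({ ok: true, received: payload }),
});
```

In the background script, `route` would be called from the runtime message listener, with unknown types answered explicitly so the sender does not wait on a reply that never arrives.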
Content Script
Responsibilities:
Optional creation/removal of visual AI frame overlays
DOM-level actions (click, type, scroll) via injected functions
Basic action parsing and execution helpers
Notes:
The current implementation focuses on DOM manipulation and does not appear to register a message listener
Commented-out code shows an earlier approach to overlay injection and removal
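To make the "basic action parsing" responsibility concrete, here is a minimal sketch of a parser for the three action kinds mentioned above (click, type, scroll). The text format it accepts (`click #submit`, `type #q hello`, `scroll 400`) is an assumption made for illustration; the format used by the actual content script is not shown in this document.

```typescript
// Hypothetical action format; the real parser's input is not documented here.
type PageAction =
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "scroll"; deltaY: number };

// Parses lines such as `click #submit`, `type #q hello world`, `scroll 400`.
// Returns null for anything it does not recognise. Runs of whitespace in the
// typed text are collapsed to single spaces by the tokenizer.
function parseAction(line: string): PageAction | null {
  const [verb, ...rest] = line.trim().split(/\s+/);
  switch (verb) {
    case "click":
      return rest.length === 1 ? { kind: "click", selector: rest[0] } : null;
    case "type":
      return rest.length >= 2
        ? { kind: "type", selector: rest[0], text: rest.slice(1).join(" ") }
        : null;
    case "scroll": {
      const deltaY = Number(rest[0]);
      return rest.length === 1 && Number.isFinite(deltaY)
        ? { kind: "scroll", deltaY }
        : null;
    }
    default:
      return null;
  }
}
```

Returning `null` for unrecognised input, rather than throwing, lets the caller decide whether to skip the action or report it back to the agent.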
Side Panel UI
Responsibilities:
Mounts React app in a shadow DOM
Provides authentication flow (Google OAuth and demo GitHub login)
Manages active tab and tab list
Integrates WebSocket client for agent execution and statistics
Executes agent commands and browser actions
Key integrations:
Shadow DOM mounting via WXT content script API
Authentication hook for OAuth and token refresh
Tab management hook for active tab and tab list
WebSocket client for agent execution and progress updates
Action executor for browser-level actions
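The token-refresh part of the authentication hook could rest on a check like the one below. This is only a sketch: the `StoredToken` shape, the `needsRefresh` name, and the 60-second safety margin are assumptions, not details taken from the actual hook.

```typescript
// Hypothetical token shape; `expiresAt` is in epoch milliseconds.
interface StoredToken {
  accessToken: string;
  expiresAt: number;
}

// True when there is no token or it expires within `skewMs` of `now`.
function needsRefresh(token: StoredToken | null, now: number, skewMs = 60000): boolean {
  return token === null || token.expiresAt - now <= skewMs;
}
```

Passing `now` in as a parameter, instead of calling `Date.now()` inside, keeps the check deterministic and easy to test.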
Messaging System Architecture
The messaging system connects the UI, background, and content layers:
UI sends commands to background via runtime.sendMessage
Background routes messages to appropriate handlers
Background injects content scripts and communicates with content script via tabs.sendMessage
Content script executes DOM actions and returns results
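The background-to-content leg of this flow can be sketched as follows. The `BrowserLike` interface is a hand-written slice of the extension APIs whose call shapes mirror `scripting.executeScript` and `tabs.sendMessage`, so the flow can be exercised with fakes. The file path, the `PERFORM_ACTION` message type, and the `forwardAction` name are hypothetical.

```typescript
// Minimal slice of the extension APIs used here, so the flow can be run
// against fakes. Call shapes mirror scripting.executeScript and tabs.sendMessage.
interface BrowserLike {
  scripting: {
    executeScript(opts: { target: { tabId: number }; files: string[] }): Promise<unknown>;
  };
  tabs: {
    sendMessage(tabId: number, message: unknown): Promise<unknown>;
  };
}

// Hypothetical output path of the built content script.
const CONTENT_SCRIPT_FILE = "content-scripts/content.js";

async function forwardAction(api: BrowserLike, tabId: number, action: unknown): Promise<unknown> {
  // 1. Make sure the content script is present in the target tab.
  await api.scripting.executeScript({ target: { tabId }, files: [CONTENT_SCRIPT_FILE] });
  // 2. Forward the action and return whatever the content script replies.
  return api.tabs.sendMessage(tabId, { type: "PERFORM_ACTION", action });
}
```

Injecting before sending matters because a message sent to a tab with no listening content script is rejected; awaiting the injection first keeps the two steps ordered.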
Component Relationships
Side Panel App depends on hooks for authentication and tab management
AgentExecutor integrates with WebSocket client and action executor
Background script coordinates messaging and tab operations
Content script provides DOM-level automation
External dependencies include React, Socket.IO client, and Google Generative AI SDK. Internal dependencies are structured around hooks and utilities.
Performance considerations:
Minimize DOM operations: batch DOM queries and mutations in content script
Rate-limit UI updates: debounce or throttle progress updates and tab list refreshes
Lazy loading: defer heavy computations until needed (e.g., Gemini SDK dynamic import)
Efficient messaging: avoid excessive message traffic; coalesce updates
Memory cleanup: remove event listeners and unmount React roots when appropriate
WebSocket reconnection: configure retry policies and backoff strategies
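For the reconnection point, one common retry policy is capped exponential backoff. The sketch below is an illustration under assumptions: the `backoffDelay` name, the default delays, and the jitter scheme are not taken from websocket-client.ts.

```typescript
// Capped exponential backoff. `attempt` starts at 0.
// `jitter` returns a factor in [0, 1); it defaults to no jitter so results are
// deterministic, and Math.random can be passed in for real use.
function backoffDelay(
  attempt: number,
  baseMs = 500,
  maxMs = 30000,
  jitter: () => number = () => 0,
): number {
  const capped = Math.min(maxMs, baseMs * 2 ** attempt);
  // Shave off up to half of the delay so clients do not retry in lockstep.
  return capped - jitter() * (capped / 2);
}
```

If the client is Socket.IO, as the dependency list suggests, its built-in `reconnectionDelay`, `reconnectionDelayMax`, and `randomizationFactor` options may cover this without a hand-written policy.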
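Security considerations:
Permissions: review the declared permissions and limit them to those the functionality requires
Host permissions: <all_urls> grants access to every site the user visits; narrow it to specific origins where possible and keep a strict content security policy for extension pages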
OAuth: validate redirect URIs and handle errors gracefully; store tokens securely in browser storage
Content script isolation: avoid exposing sensitive data; sanitize inputs before DOM manipulation
Cross-origin requests: validate and sanitize external API responses; handle rate limits and errors
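As one example of the validation steps above, the redirect URL returned at the end of an OAuth flow can be checked against the expected base before any token is read from it. This is a sketch under assumptions: the helper names are hypothetical, and `expectedBase` would typically come from `identity.getRedirectURL()`.

```typescript
// Returns a parsed URL, or null when the input is not an absolute URL.
function parseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null;
  }
}

// Checks that an OAuth redirect landed on the expected origin and path prefix.
function isExpectedRedirect(redirectUrl: string, expectedBase: string): boolean {
  const actual = parseUrl(redirectUrl);
  const expected = parseUrl(expectedBase);
  if (!actual || !expected) return false;
  return actual.origin === expected.origin && actual.pathname.startsWith(expected.pathname);
}
```

Comparing parsed origins, rather than testing whether the string contains the expected host, avoids being fooled by URLs that merely embed that host in a query parameter.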
Cross-browser compatibility:
WXT supports multiple browsers; ensure browser-specific APIs are handled consistently
Use browser polyfills or feature detection for APIs not universally available
Test manifest keys and permissions across Chrome, Firefox, and Edge
Validate content script injection and messaging behavior differences
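Feature detection for APIs that are not available everywhere can be sketched with a small path check. The `hasApi` helper is hypothetical; in the extension it would be called with the global extension API object, for example `hasApi(browser, "sidePanel.open")`.

```typescript
// Walks a dotted path such as "sidePanel.open" on an API root object and
// reports whether every segment exists.
function hasApi(root: unknown, path: string): boolean {
  let current: unknown = root;
  for (const key of path.split(".")) {
    const isContainer =
      current !== null && (typeof current === "object" || typeof current === "function");
    if (!isContainer || !(key in (current as object))) return false;
    current = (current as Record<string, unknown>)[key];
  }
  return current !== undefined && current !== null;
}
```

Checking for the capability itself, instead of sniffing the browser name, keeps the code working when a browser later adds or removes an API.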
Common issues and resolutions:
Messaging timeouts: verify message listener registration and ensure async responses are sent
Content script injection failures: confirm scripting permissions and correct file paths
Tab operations failing: check tabs permissions and active tab queries
WebSocket connectivity: verify URL configuration and network availability
Authentication errors: validate OAuth flow and token refresh logic
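A frequent cause of the messaging timeouts listed above is a listener that does its work asynchronously but does not keep the message channel open. In Chrome, a `runtime.onMessage` listener keeps the channel open by returning `true` synchronously and calling `sendResponse` later. The wrapper below is a minimal sketch of that pattern; the `asyncListener` name and the reply shape are assumptions.

```typescript
type SendResponse = (response: unknown) => void;

// Wraps an async handler so the listener returns `true` synchronously, which
// keeps the message channel open until sendResponse is called.
function asyncListener(handler: (message: unknown) => Promise<unknown>) {
  return (message: unknown, _sender: unknown, sendResponse: SendResponse): true => {
    handler(message).then(
      (result) => sendResponse({ ok: true, result }),
      // Always answer, even on failure, so the sender does not hang.
      (error) => sendResponse({ ok: false, error: String(error) }),
    );
    return true;
  };
}
```

The wrapped function would be registered with the runtime message listener in the background script; replying on the error path as well means a failed handler surfaces as an error in the caller rather than as a timeout.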
The extension architecture leverages WXT’s entry points and React to deliver a cohesive browser automation experience. The background script centralizes messaging and coordination, the content script handles page-level automation, and the side panel UI provides user interaction and agent orchestration. Proper configuration, security hardening, and performance optimization are essential for robust cross-browser deployment.